AITopics

Technology: Information Technology > Artificial Intelligence (0.78)

Neural Information Processing Systems, Dec-23-2025, 19:22:24 GMT

A Bayesian-Symbolic Approach to Reasoning and Learning in Intuitive Physics

Humans can reason about intuitive physics in fully or partially observed environments even after being exposed to a very limited set of observations. This sample-efficient intuitive physical reasoning is considered a core domain of human common sense knowledge. One hypothesis to explain this remarkable capacity posits that humans quickly learn approximations to the laws of physics that govern the dynamics of the environment. In this paper, we propose a Bayesian-symbolic framework (BSP) for physical reasoning and learning that is close to human-level sample-efficiency and accuracy. In BSP, the environment is represented by a top-down generative model of entities, which are assumed to interact with each other under unknown force laws over their latent and observed properties. BSP models each of these entities as a random variable and uses Bayesian inference to estimate their unknown properties.
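To make the idea of estimating an unknown property by Bayesian inference concrete, here is a minimal sketch. It is not the BSP implementation: it assumes a single entity with one latent property (mass), a known force law F = m * a, Gaussian observation noise, and a uniform prior over a small grid of candidate masses. The function name `posterior_over_mass` and all parameters are hypothetical.

```python
import math

def posterior_over_mass(candidate_masses, force, observed_accels, noise_std=0.5):
    """Grid posterior over a latent mass, assuming F = m * a and Gaussian noise.

    Illustrative assumption: a uniform prior over candidate_masses.
    """
    log_weights = []
    for mass in candidate_masses:
        predicted_accel = force / mass
        # Gaussian log-likelihood of the observations (constants dropped).
        log_lik = sum(-0.5 * ((a - predicted_accel) / noise_std) ** 2
                      for a in observed_accels)
        log_weights.append(log_lik)
    # Normalize in a numerically stable way.
    peak = max(log_weights)
    weights = [math.exp(lw - peak) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

masses = [1.0, 2.0, 4.0]
posterior = posterior_over_mass(masses, force=4.0, observed_accels=[2.1, 1.9, 2.0])
best_mass = masses[posterior.index(max(posterior))]
```

With only three noisy observations, the posterior already concentrates on the candidate mass whose predicted acceleration matches the data, which is the kind of sample efficiency the abstract refers to.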

bayesian-symbolic approach, physics, reasoning and learning

Technology: Information Technology > Artificial Intelligence (0.97)

arXiv.org Artificial Intelligence, Dec-9-2025

Opinion: Learning Intuitive Physics May Require More than Visual Data

Su, Ellen, Legris, Solim, Gureckis, Todd M., Ren, Mengye

Humans expertly navigate the world by building rich internal models founded on an intuitive understanding of physics. Meanwhile, despite training on vast quantities of internet video data, state-of-the-art deep learning models still fall short of human-level performance on intuitive physics benchmarks. This work investigates whether data distribution, rather than volume, is the key to learning these principles. We pretrain a Video Joint Embedding Predictive Architecture (V-JEPA) model on SAYCam, a developmentally realistic, egocentric video dataset partially capturing three children's everyday visual experiences. We find that training on this dataset, which represents 0.01% of the data volume used to train SOTA models, does not lead to significant performance improvements on the IntPhys2 benchmark. Our results suggest that merely training on a developmentally realistic dataset is insufficient for current architectures to learn representations that support intuitive physics. We conclude that varying visual data volume and distribution alone may not be sufficient for building systems with artificial intuitive physics.

artificial intelligence, deep learning, machine learning

2512.06232

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

arXiv.org Artificial Intelligence, Dec-2-2025

Towards aligned body representations in vision models

Gizdov, Andrey, Procopio, Andrea, Li, Yichen, Harari, Daniel, Ullman, Tomer

Human physical reasoning relies on internal "body" representations -- coarse, volumetric approximations that capture an object's extent and support intuitive predictions about motion and physics. While psychophysical evidence suggests humans use such coarse representations, their internal structure remains largely unknown. Here we test whether vision models trained for segmentation develop comparable representations. We adapt a psychophysical experiment conducted with 50 human participants to a semantic segmentation task and test a family of seven segmentation networks, varying in size. We find that smaller models naturally form human-like coarse body representations, whereas larger models tend toward overly detailed, fine-grained encodings. Our results demonstrate that coarse representations can emerge under limited computational resources, and that machine representations can provide a scalable path toward understanding the structure of physical reasoning in the brain.

artificial intelligence, machine learning, representation

2512.00365

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing Systems, Nov-21-2025, 15:18:46 GMT

Learning to Poke by Poking: Experiential Learning of Intuitive Physics

We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects to target locations by poking. The robot gathered over 400 hours of experience by executing more than 50K pokes on different objects. We propose a novel approach based on deep neural networks for modeling the dynamics of the robot's interactions directly from images, by jointly estimating forward and inverse models of dynamics. The inverse model objective provides supervision to construct informative visual features, which the forward model can then predict, in turn regularizing the feature space for the inverse model. The interplay between these two objectives creates useful, accurate models that can then be used for multi-step decision making. This formulation has the additional benefit that it is possible to learn forward models in an abstract feature space and thus alleviate the need to predict pixels. Our experiments show that this joint modeling approach outperforms alternative methods. We also demonstrate that active data collection using the learned model further improves performance.
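The joint objective described above can be sketched as a single loss that combines an inverse-model term with a forward-model term computed in feature space. This is a toy illustration under stated assumptions, not the authors' network: the encoder is one linear layer with tanh, both models are linear, both terms use squared error, and the weighting `lam` is a hypothetical hyperparameter. All function names are made up for this sketch.

```python
import math
import random

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def encode(w_enc, image):
    # Shared feature extractor (assumed: one linear layer followed by tanh).
    return [math.tanh(h) for h in matvec(w_enc, image)]

def joint_loss(image_t, image_next, action, params, lam=0.1):
    """Return (total, inverse_loss, forward_loss) for one transition."""
    w_enc, w_inv, w_fwd = params
    z_t = encode(w_enc, image_t)
    z_next = encode(w_enc, image_next)
    # Inverse model: predict the action from features before and after it.
    action_hat = matvec(w_inv, z_t + z_next)
    inverse_loss = mse(action_hat, action)
    # Forward model: predict the next features from current features and
    # the action, so no pixels have to be predicted.
    z_next_hat = matvec(w_fwd, z_t + list(action))
    forward_loss = mse(z_next_hat, z_next)
    return inverse_loss + lam * forward_loss, inverse_loss, forward_loss

def make_params(obs_dim, feat_dim, act_dim, seed=0):
    rng = random.Random(seed)
    def rand(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return (rand(feat_dim, obs_dim),
            rand(act_dim, 2 * feat_dim),
            rand(feat_dim, feat_dim + act_dim))

params = make_params(obs_dim=8, feat_dim=4, act_dim=2)
total, inv, fwd = joint_loss([1.0] * 8, [0.0] * 8, [0.5, -0.5], params, lam=0.1)
```

Because both terms depend on the same encoder weights, minimizing the sum lets the inverse term shape the features while the forward term regularizes them, which is the interplay the abstract describes.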

experiential learning, learning, name change

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

arXiv.org Artificial Intelligence, May-20-2025

A simulation-heuristics dual-process model for intuitive physics

Li, Shiqian, Ma, Yuxi, Yan, Jiajun, Dai, Bo, Peng, Yujia, Zhang, Chi, Zhu, Yixin

The role of mental simulation in human physical reasoning is widely acknowledged, but whether it is employed across scenarios with varying simulation costs and where its boundary lies remains unclear. Using a pouring-marble task, our human study revealed two distinct error patterns when predicting pouring angles, differentiated by simulation time. While mental simulation accurately captured human judgments in simpler scenarios, a linear heuristic model better matched human predictions when simulation time exceeded a certain boundary. Motivated by these observations, we propose a dual-process framework, the Simulation-Heuristics Model (SHM), in which intuitive physics relies on simulation when simulation time is short but switches to heuristics when simulation becomes costly. By integrating computational methods previously viewed as separate into a unified model, SHM quantitatively captures their switching mechanism. The SHM aligns more precisely with human behavior and demonstrates consistent predictive performance across diverse scenarios, advancing our understanding of the adaptive nature of intuitive physical reasoning.
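The switching behavior described above can be illustrated with a short sketch. It assumes a hard threshold on estimated simulation time and a plain linear heuristic; the abstract does not specify either detail, so `dual_process_predict`, `boundary`, and the toy simulator below are hypothetical.

```python
def dual_process_predict(features, sim_time, boundary, simulate, weights, bias=0.0):
    """Return (prediction, process_used) for one scenario.

    Assumption: simulation is used whenever its estimated time is within
    `boundary`; otherwise a linear heuristic over the features is used.
    """
    if sim_time <= boundary:
        return simulate(features), "simulation"
    linear = bias + sum(w * f for w, f in zip(weights, features))
    return linear, "heuristic"

def toy_simulator(features):
    # Stand-in for a mental-simulation model (purely illustrative).
    return 2.0 * features[0] + features[1] ** 2

cheap = dual_process_predict([1.0, 3.0], sim_time=0.4, boundary=1.0,
                             simulate=toy_simulator, weights=[5.0, 1.0])
costly = dual_process_predict([1.0, 3.0], sim_time=2.5, boundary=1.0,
                              simulate=toy_simulator, weights=[5.0, 1.0])
```

The same scenario features yield different predictions depending on which process is selected, which is how a dual-process account can produce two distinct error patterns.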

artificial intelligence, machine learning, physics

2504.09546

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

arXiv.org Artificial Intelligence, Feb-17-2025

Intuitive physics understanding emerges from self-supervised pretraining on natural videos

Garrido, Quentin, Ballas, Nicolas, Assran, Mahmoud, Bardes, Adrien, Najman, Laurent, Rabbat, Michael, Dupoux, Emmanuel, LeCun, Yann

We investigate the emergence of intuitive physics understanding in general-purpose deep neural network models trained to predict masked regions in natural videos. Leveraging the violation-of-expectation framework, we find that video prediction models trained to predict outcomes in a learned representation space demonstrate an understanding of various intuitive physics properties, such as object permanence and shape consistency. In contrast, video prediction in pixel space and multimodal large language models, which reason through text, achieve performance closer to chance. Our comparisons of these architectures reveal that jointly learning an abstract representation space while predicting missing parts of sensory input, akin to predictive coding, is sufficient to acquire an understanding of intuitive physics, and that even models trained on one week of unique video achieve above chance performance. This challenges the idea that core knowledge -- a set of innate systems to help understand the world -- needs to be hardwired to develop an understanding of intuitive physics.
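A violation-of-expectation evaluation of this kind can be sketched as follows. The sketch assumes that "surprise" is the mean absolute error between predicted and observed frame representations, and that a model is scored as correct on a video pair when the physically impossible video is more surprising than the possible one. Both the metric and the function names are assumptions made for illustration, not the paper's exact protocol.

```python
def surprise(predicted_frames, observed_frames):
    """Mean per-frame L1 error between predicted and observed representations."""
    per_frame = [
        sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)
        for pred, obs in zip(predicted_frames, observed_frames)
    ]
    return sum(per_frame) / len(per_frame)

def pairwise_accuracy(surprise_pairs):
    """Fraction of (possible, impossible) pairs where the impossible video
    receives the higher surprise."""
    correct = sum(1 for possible, impossible in surprise_pairs
                  if impossible > possible)
    return correct / len(surprise_pairs)

example_surprise = surprise([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 3.0]])
example_accuracy = pairwise_accuracy([(0.1, 0.5), (0.4, 0.2)])
```

Under this scoring, a model with no physical understanding would rank the two videos of a pair at random, so chance performance is 0.5.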

large language model, machine learning, natural language

2502.11831

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Neural Information Processing Systems, Jan-27-2025, 20:03:08 GMT

Reviews: Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

Many studies have looked at the idea of physics simulation as a cognitive model. In such works, physics engines are usually employed as a model of human cognition of physical tasks, with the perception part of the task often abstracted away. In parallel, data-driven models have frequently been used to learn to parse raw visual inputs to detect or locate objects, often without using any explicit model of the physical world. This paper tries to bridge these two fields to build a complete model of how humans perceive certain physical scenarios, from raw pixels to expectations over objects. Whereas all of the parts employed in the proposed "pipeline" are based on previous works, their arrangement into this contiguous framework is new, as are the human and model results on the new dataset the authors also present.

coarse probabilistic object representation, intuitive physics, modeling expectation violation

Technology: Information Technology > Artificial Intelligence > Vision (0.74)